The Validity and Reliability of Concept Mapping as an Alternative
Science Assessment when Item Response Theory is Used for Scoring* 

Introduction 

Concept mapping as an alternative science assessment has been discussed
intensively in the literature. Concept mapping is a technique used to represent the
relationships between concepts in a two-dimensional graph. It was originally used by
Novak and his colleagues (Novak and Gowin, 1984) as an instructional and assessment
tool for science learning during the 1970s. Concept mapping has been used primarily as a
diagnostic tool to assess students' conceptions (Moreira, 1985; Ross and Mundy, 1991;
Wallace and Mintzes, 1990). More recently, concept mapping has been used as an
alternative science classroom achievement assessment. For example, Gaffney (1992) used
concept mapping to evaluate students' achievement on botany and natural communities in
a fifth grade class. Tippins and Dana (1992) used concept mapping as a culturally relevant
assessment. The use of concept mapping for assessing learning processes has also been
reported. Fleener and Marek (1992) used concept mapping to assess students' learning in
the three phases of a learning cycle (exploration, conceptual invention, and expansion).
Roth (1992) also used concept mapping to assess students' learning/investigation process.
The comprehensive use of concept mapping in designing instruction and assessment has
been reported by Barenholz and Tamir (1992).

Although, as reviewed above, considerable effort has been devoted to concept mapping
as an alternative assessment, the empirical findings on the validity and reliability of
concept mapping as an alternative achievement assessment are very preliminary and far
from conclusive. In Liu's (1993) study, students' concept mapping scores correlated
significantly with their scores on conventional tests. This result is consistent with
other studies. For example, Bousquet (1982) found that concept map scores could predict
students' achievement in a college natural resources class. Fraser and Edwards (1985)
found that students who scored at high levels on end-of-unit tests showed high levels of
concept mastery as indicated by the concept maps they made. However, opposite
conclusions about the predictive validity of concept mapping have also been reported. For
example, Novak, Gowin and Johansen (1983) reported a poor correlation between students'
concept map scores and their scores on standardized tests. Trigwell and Sleet (1990) also
found that concept mapping scores had a low correlation with traditional examination
scores.

The divergent conclusions about the predictive validity of concept mapping may be
due to the different scoring schemes used. Various scoring schemes for concept maps have
been reported in the literature, such as those in Cleare (1983), Novak and Gowin (1984),
Schreiber and Abegg (1991), Vargas and Alvarez (1992), and Wallace and Mintzes (1990).
The scoring schemes proposed so far are based on evaluating aspects of a concept map,
such as the numbers of correct links, hierarchies, cross-links and examples. For example,
Novak and Gowin (1984) proposed scoring valid links (1 point each), valid hierarchies
(5 points each), valid cross-links (2 or 10 points each, depending on whether or not the
cross-link is significant), and valid examples (1 point each). Schreiber and Abegg's (1991)
scoring scheme takes into account the hierarchical structure of a concept map, the
identified propositions, and the actual versus implied validity of concept map components.
The overall score of a concept map is defined as

$$X = [x - n(b + c)] + \frac{b}{c} \qquad (1)$$

where

X is the overall concept map score;

x is the initial tally of points (ratios) awarded for recognition of hierarchical,
propositional and valid constructs on a concept map;

n is the number of strands in a concept map;

b is the summed ratios of the number of vocabulary terms to the number of hierarchical
levels (per strand); and

c is the summed ratios of the number of valid connecting lines to the total number of
connecting lines drawn.
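As a rough illustration of how the terms of formula (1) combine, the following Python sketch evaluates X for made-up inputs. It assumes the formula as transcribed here, X = [x - n(b + c)] + b/c; the function name and the example numbers are hypothetical and are not taken from Schreiber and Abegg (1991).

```python
def schreiber_abegg_score(x: float, n: int, b: float, c: float) -> float:
    """Overall concept map score X = [x - n(b + c)] + b/c (formula 1, as transcribed).

    x: initial tally of points (ratios) for hierarchical, propositional and valid constructs
    n: number of strands in the map
    b: summed ratios of vocabulary terms to hierarchical levels (per strand)
    c: summed ratios of valid connecting lines to total connecting lines drawn
    """
    if c == 0:
        raise ValueError("c must be non-zero because the formula divides by c")
    return (x - n * (b + c)) + b / c


# Hypothetical inputs, chosen only to show the arithmetic.
print(schreiber_abegg_score(x=30.0, n=2, b=4.0, c=1.5))
```

Because b and c enter both the penalty term n(b + c) and the ratio b/c, the same map can move in opposite directions depending on how strands are counted, which is one reason scores from this scheme are hard to compare with additive schemes.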

Because different scoring schemes emphasize different concept map aspects and award
them different weights, the same concept map produced by a student may receive different
scores under different scoring schemes.

Reliability is another issue that has not been addressed intensively in the
literature. Although the internal consistency among valid links, valid hierarchies, valid
cross-links and valid examples under Novak's scoring scheme was fairly high (.65) in
Liu's (1993) study, inter-rater reliability has not been reported in the literature. A
low inter-rater reliability may be expected. One reason is that, in students' concept
maps, some links and cross-links may be drawn without linking words. This situation is
common in novices' concept maps. It is therefore difficult to judge a link as correct or
incorrect, since the judgment may depend on the rater's assumption about which linking
word the student intended. A more fundamental reason is that students' conceptions, as
demonstrated by links, hierarchies, cross-links and examples in a concept map, are
intrinsically difficult to judge as totally correct or totally incorrect: many studies
have shown that students' conceptions may make sense in some respects without being
completely consistent with scientific views. Students' conceptions may therefore be better
considered along a continuum from nonsense to scientific conceptions (Driver and Erickson,
1983; Driver and Bell, 1986; O'Loughlin, 1992). This is not to deny that a high
inter-rater reliability may be achieved if raters receive intensive training and are
allowed sufficient discussion.
The present study addresses these validity and reliability problems by using Item
Response Theory (IRT) models for scoring. IRT models students' responses to test items
as a function of item characteristics (such as item difficulty) and students' abilities.
Two advantages have been claimed for IRT applications: item parameter estimates are
invariant with respect to the sample used for calibration, and ability parameter
estimates are invariant with respect to the test used for calibration. Thus, when IRT
models are used for scoring, students' achievements can be compared even if they do not
write the same test (Hambleton, 1989). Samejima (1969) extended the traditional IRT
models to graded IRT models, so that students' responses to an item no longer have to be
scored dichotomously (as right or wrong) but can be graded into categories. When IRT is
used for scoring concept maps, the four aspects of a concept map (links, hierarchies,
cross-links and examples) are treated as "test items", and the numbers of links,
hierarchies, cross-links and examples are treated as students' categorical responses to
those "test items". By applying graded IRT models to these responses, it is possible to
obtain estimates of students' abilities.

IRT scoring emphasizes the overall structure of students' concept maps instead of
the correctness of specific concept map aspects. In this study, the overall structure of
a student's concept map is defined by the number of links, the number of hierarchies, the
number of cross-links and the number of examples. An analysis of the structural
characteristics of students' concept maps was reported by Wilson (1993). In Wilson's
study, a 24 x 24 matrix was created to represent the inter-relationships among the 24
concepts provided for concept mapping; each entry indicated whether or not a connection
between the two concepts existed. By applying non-parametric multidimensional scaling,
coordinates on three dimensions were obtained. The canonical correlation between these
coordinates and students' conventional achievement test scores was found to be
significant. However, Wilson's study did not provide a scoring scheme based on the
structural characteristics of concept maps. This study proposes a scoring scheme based on
the structural characteristics of concept maps and examines the
validity and reliability of this scoring scheme. 
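To make the matrix representation in Wilson's (1993) study concrete, the following Python sketch builds a symmetric 0/1 concept-by-concept matrix from a list of drawn connections. The concept names and connections are hypothetical, ignoring link direction is an assumption, and the multidimensional scaling step is not shown.

```python
from typing import Iterable


def connection_matrix(concepts: list[str], connections: Iterable[tuple[str, str]]) -> list[list[int]]:
    """Return a symmetric matrix with 1 where two concepts are connected, else 0."""
    index = {name: i for i, name in enumerate(concepts)}
    size = len(concepts)
    matrix = [[0] * size for _ in range(size)]
    for a, b in connections:
        i, j = index[a], index[b]
        # Direction of the drawn link is ignored: only the existence of a connection matters.
        matrix[i][j] = 1
        matrix[j][i] = 1
    return matrix


# Hypothetical 4-concept example (Wilson used 24 concepts).
concepts = ["matter", "solid", "liquid", "ice"]
drawn = [("matter", "solid"), ("matter", "liquid"), ("solid", "ice")]
for row in connection_matrix(concepts, drawn):
    print(row)
```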

Methodology 

Procedure 

The following procedure was used to obtain students' ability estimates and item
characteristics:

1. each concept mapping task is treated as a test containing four "test items":
links, hierarchies, cross-links, and examples;

2. students' concept maps are measured by the number of links, number of
hierarchies, number of cross-links, and number of examples. These numbers are the
categorical responses to the "test items" in step 1;

3. Samejima's graded IRT model is applied to students' categorical responses to obtain
estimates of students' abilities and of the characteristics of the "test items".
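Steps 1 and 2 can be sketched in Python as reducing a concept map to a four-element response vector. The data structure is hypothetical, and so are two counting rules: cross-links are kept separate from ordinary links, and the number of hierarchies is taken as the deepest level below the top concept when following ordinary links. The text does not prescribe either rule.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ConceptMap:
    """Hypothetical representation of a student's concept map."""
    root: str
    links: list[tuple[str, str]] = field(default_factory=list)        # hierarchical (parent, child) links
    cross_links: list[tuple[str, str]] = field(default_factory=list)  # links between branches
    examples: list[str] = field(default_factory=list)


def hierarchy_levels(cmap: ConceptMap) -> int:
    """Number of levels below the root, following ordinary links breadth-first."""
    children: dict[str, list[str]] = {}
    for parent, child in cmap.links:
        children.setdefault(parent, []).append(child)
    depth = {cmap.root: 0}
    queue = deque([cmap.root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in depth:  # guard against cycles
                depth[child] = depth[node] + 1
                queue.append(child)
    return max(depth.values())


def responses(cmap: ConceptMap) -> dict[str, int]:
    """Categorical responses to the four "test items"."""
    return {
        "links": len(cmap.links),
        "hierarchies": hierarchy_levels(cmap),
        "cross_links": len(cmap.cross_links),
        "examples": len(cmap.examples),
    }


demo = ConceptMap(
    root="matter",
    links=[("matter", "solid"), ("matter", "liquid"), ("solid", "metal")],
    cross_links=[("metal", "liquid")],
    examples=["iron"],
)
print(responses(demo))
```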

The software used to estimate students' abilities and item characteristics based on
Samejima's graded IRT model was MULTILOG (Thissen, 1991). MULTILOG has been
widely used for categorical IRT analysis for years.

Two aspects of validity, construct validity and consequential validity, were studied 
following the conceptual analysis by Messick (1989). As for the construct validity, two 
aspects were examined: internal construct validity was assessed by examining the difficulty 
and discrimination of the four "test items" and by examining the inter-relationship between 
the four "test items"; external construct validity was assessed by examining the 
relationship between students' IRT ability estimates and students' concept mapping scores 
according to Novak's scoring scheme and by examining the relationship between students' 
IRT ability estimates and their scores on the conventional tests. 

In IRT applications that use maximum likelihood estimation, measurement precision is
expressed by the Standard Error of Estimation (SEE). Since the SEE is defined at each
ability level, an average SEE over all ability estimates can be calculated as the SEE for
a test. Based on the average SEE, the reliability of a test can be calculated according to
the conventional formula

$$\rho_{xx} = 1 - \frac{\sigma_e^2}{\sigma_x^2} \qquad (2)$$

where

$\rho_{xx}$ is the reliability of the test;

$\sigma_e$ is the standard error of estimation (SEE); and

$\sigma_x$ is the standard deviation of students' ability estimates.

Data source 

This study was conducted at a junior high school in a Canadian Atlantic province.
Four grade 7 general science classes taught by two teachers participated in the study. In
this school, the top 60 higher-achieving students and the bottom 30 lower-achieving
students are assigned to classes first, and the remaining students are then randomly
assigned to classes. The two classes of the top 60 higher-achieving students are called
enriched classes, the class of the bottom 30 lower-achieving students is called an
adjusted class, and the randomly formed classes are called regular classes. Among the
four classes participating in this study, one was an adjusted class, one a regular class
and two were enriched classes.

The concept mapping technique was introduced to the classes by the two teachers at
the beginning of the term, following the procedures outlined in Novak and Gowin (1984).
Both teachers are very familiar with concept mapping and have used it in their instruction
for years. The data used in this study were from an end-of-unit test administered toward
the end of the first term of the academic year, when the students had grasped basic
concept mapping techniques. After finishing the unit, the teachers gave the classes a
conventional test as before, and also a concept mapping test. The concept mapping test
was based on a list of concepts provided by the teachers. The list was drawn from the
conventional test, and students were asked to use some or all of the concepts provided to
draw concept maps. Students could also use concepts not included in the list. The
conventional test took one class period (45 minutes), and the concept mapping test took
one class period as well. Figure 1 is a sample student's concept map. From the presence
of cross-links and the frequent use of linking words, we may infer that this student had
grasped concept mapping techniques quite well. Students' concept maps were then evaluated
by the numbers of links, hierarchies, cross-links, and examples, and those numbers
constituted the students' responses to the four "test items": links, hierarchies,
cross-links, and examples. Samejima's graded IRT model was applied to analyze students'
responses.



Insert Figure 1 about here 



Results 

Validity

Internal validity 

Columns 1 to 4 in Table 1 list the numbers of links, hierarchies, cross-links and
examples in students' concept maps. From the bottom of Table 1 (M and SD), the average
number of links is 22, the average number of hierarchies is 6.2, the average number of
cross-links is [illegible], and the average number of examples is 2.2. The table also
shows greater variation in the number of links and the number of examples in students'
concept maps. The inter-correlations among these numbers are listed in Table 2.

Insert Tables 1 and 2 about here

From Table 2, it can be seen that the number of links, the number of hierarchies and the
number of cross-links are significantly correlated with each other, but the number of
examples is not significantly correlated with any of them.

Because MULTILOG can process a maximum of 10 categories, whereas the maximum
numbers of links, hierarchies and examples were 39, 14, and 21 respectively, the data were
transformed before MULTILOG was applied. The rationale for the transformation was that
spreading students' responses widely across the categories from 0 to 10 would be expected
to yield higher discriminating power. Based on this rationale, the numbers of links were
divided by 4 and rounded to the nearest whole number. All numbers of hierarchies and
examples greater than 10 were recoded as 10 (the highest category). Only 6 of the 92
students had more than 10 hierarchies, and only 4 of the 92 students had more than 10
examples. After MULTILOG was applied to the transformed data file, the a and b parameter
estimates for each "test item" (links, hierarchies, cross-links and examples) were
obtained; they are listed in Table 3.
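The recoding described above can be sketched in Python as follows. The function name and dictionary layout are hypothetical. Rounding halves upward (2.5 becomes 3) is an assumption, since the text says only "rounded to the nearest whole number", and Python's built-in `round()` would instead round halves to the nearest even integer. Cross-links pass through unchanged because no recoding of them is described.

```python
import math


def recode_for_multilog(links: int, hierarchies: int, cross_links: int, examples: int) -> dict[str, int]:
    """Recode raw concept map counts into at most 10 categories (sketch of the described transformation)."""
    return {
        # Links: divide by 4 and round to the nearest whole number (halves rounded up).
        "links": math.floor(links / 4 + 0.5),
        # Hierarchies and examples: values above 10 collapse into the top category, 10.
        "hierarchies": min(hierarchies, 10),
        "cross_links": cross_links,
        "examples": min(examples, 10),
    }


# The reported maxima (39 links, 14 hierarchies, 21 examples) all map to category 10;
# the cross-link count of 3 is purely illustrative.
print(recode_for_multilog(links=39, hierarchies=14, cross_links=3, examples=21))
```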

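In Samejima's graded IRT model, an Item Characteristic Curve (ICC) is defined as

$$P(x = k) = \frac{1}{1 + \exp[-a(\theta - b_{k-1})]} - \frac{1}{1 + \exp[-a(\theta - b_{k})]} \qquad (3)$$

(for k = 1 the first term is taken as 1, and for k = m the second term is taken as 0),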

where 

k is the category of an examinee's response to an item, k = 1, 2, ..., m, where m is the
highest category;

$\theta$ is an examinee's ability;

a is the slope, which defines the discriminating power of an item;

$b_k$ is the threshold that defines the difficulty of an item at category k; and

$P(x = k)$ is the probability that an examinee with ability $\theta$ responds in category k
to an item with discriminating power a and thresholds $b_k$.
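The following Python sketch evaluates the graded-model ICC defined above, P(x = k) = 1/(1 + exp[-a(θ - b_{k-1})]) - 1/(1 + exp[-a(θ - b_k)]), for one item and returns the probability of every category. It uses the standard boundary convention (first term 1 for the lowest category, second term 0 for the highest). The thresholds are the b(1) to b(9) estimates for links in Table 3. The slope a = 1.5 is a hypothetical value chosen for illustration, because the a estimates in Table 3 are not fully legible.

```python
import math


def cumulative(theta: float, a: float, b: float) -> float:
    """Logistic boundary curve 1 / (1 + exp[-a(theta - b)])."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def graded_probabilities(theta: float, a: float, thresholds: list[float]) -> list[float]:
    """Category probabilities P(x = k), k = 1..m, with m = len(thresholds) + 1."""
    # Boundary values: 1 below the first threshold, 0 above the last one.
    bounds = [1.0] + [cumulative(theta, a, b) for b in thresholds] + [0.0]
    return [bounds[k - 1] - bounds[k] for k in range(1, len(bounds))]


LINK_THRESHOLDS = [-5.86, -1.89, -1.37, -0.73, -0.34, 0.57, 1.17, 1.58, 2.39]  # Table 3, links
A_HYPOTHETICAL = 1.5  # illustrative slope, not a reported estimate

probs = graded_probabilities(theta=-0.256, a=A_HYPOTHETICAL, thresholds=LINK_THRESHOLDS)
for k, p in enumerate(probs, start=1):
    print(f"category {k}: {p:.3f}")
```

With these illustrative inputs, the sixth category, which lies between b(5) = -0.34 and b(6) = 0.57, is the most probable.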

From Table 3, it can be seen that the number of links, the number of hierarchies,
and the number of cross-links have relatively high discriminating power (a > 1.0), but the
number of examples has relatively low discriminating power (a < 1.0). It can also be seen
that, for the numbers of links and hierarchies, the mean $\theta$ (-0.256, from Table 1) lies
between the difficulties of category 5 and category 6, meaning that a student of average
ability is likely to have 20 to 24 (5 x 4 to 6 x 4) links in his or her map. Similarly, a
student of average ability is likely to have 4 to 5 hierarchies, 1 to 2 cross-links and 2
to 3 examples. The marginal reliability of the estimation is .78, and negative twice the
log-likelihood is 11.4, indicating that the graded IRT model fits the data quite well.



Insert Table 3 about here 



External validity 

Columns 5 to 8 in Table 1 list the numbers of valid links, valid hierarchies, valid
cross-links, and valid examples, and column 9 gives students' ability estimates from
MULTILOG. Total concept mapping scores were calculated with Novak's scoring scheme, i.e.,
by awarding 1 point for each valid link, 5 points for each valid hierarchy, 10 points for
each valid cross-link, and 1 point for each valid example. The inter-correlations among
the IRT ability estimates, the numbers of valid links, valid hierarchies, valid cross-links
and valid examples, and the total concept mapping scores were computed; the correlation
matrix is listed in Table 4.
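A minimal Python version of the scoring rule as applied in this study might look like the following. Note that it awards 10 points for every valid cross-link rather than using the 2-or-10 distinction in Novak and Gowin (1984). The function name and the example counts are hypothetical.

```python
def novak_total_score(valid_links: int, valid_hierarchies: int,
                      valid_cross_links: int, valid_examples: int) -> int:
    """Total concept mapping score using the weights described in the text."""
    return (1 * valid_links
            + 5 * valid_hierarchies
            + 10 * valid_cross_links
            + 1 * valid_examples)


# Hypothetical student: 18 valid links, 5 valid hierarchies, 1 valid cross-link, 2 valid examples.
print(novak_total_score(18, 5, 1, 2))
```

Under these weights, one valid cross-link counts as much as ten valid links.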
From Table 4, it can be seen that the IRT ability estimates are significantly
correlated with the number of valid links, the number of valid hierarchies, the number of 
valid cross-links, and the total concept mapping scores. The total concept mapping scores 
are significantly correlated with the number of valid links, the number of valid hierarchies, 
and the number of valid cross-links. 



Insert Table 4 about here 



Table 1 also includes students' conventional test scores (column 10). The
inter-correlations among students' IRT ability estimates, their conventional test scores,
and the numbers of links, hierarchies, cross-links, and examples were computed and are
included in Table 2. From Table 2, it can be seen that students' ability estimates are
significantly correlated with their conventional test scores and with the numbers of
links, hierarchies, and cross-links, but not with the number of examples. It can also be
seen that students' conventional test scores are significantly correlated with the number
of cross-links, in addition to their significant correlation with the IRT ability
estimates.
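The significance flags in Table 2 can be checked with the usual t test for a Pearson correlation, t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. The sketch below assumes this was the test used, since the text states only p < .05 (two tails). It hard-codes the two-tailed .05 critical value for 90 degrees of freedom (about 1.987) rather than computing it. The two correlations are taken from Table 2.

```python
import math

T_CRIT_DF90 = 1.987  # two-tailed .05 critical t for df = 90 (n = 92), from standard t tables


def correlation_t(r: float, n: int) -> float:
    """t statistic for testing H0: rho = 0, given sample correlation r and sample size n."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


for label, r in [("IRT estimate vs conventional score", 0.276),
                 ("links vs conventional score", 0.130)]:
    t = correlation_t(r, n=92)
    verdict = "significant" if abs(t) > T_CRIT_DF90 else "not significant"
    print(f"{label}: r = {r:.3f}, t = {t:.2f}, {verdict} at .05")
```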

The inter-correlations among students' conventional test scores, the numbers of valid
links, valid hierarchies, valid cross-links and valid examples, and the total concept
mapping scores were also computed and are listed in Table 5.

From Table 5, it can be seen that students' conventional test scores are significantly
correlated with their total concept mapping scores and with the number of valid
cross-links.
Insert Table 5 about here



To study the effect of student group, MULTILOG was applied again with the
examinees divided into three groups: enriched classes, regular class, and adjusted class.
The inter-correlation matrices for the three groups are listed in Tables 6 to 8. From
Tables 6 to 8, it can be seen that, within each group, students' IRT ability estimates are
not significantly correlated with their conventional test scores.



Insert Tables 6-8 about here 



Consequential validity 

When IRT is used for scoring concept maps, the immediate advantage is that teachers
are freed from uncertainty about whether a proposition in a student's map is correct or
incorrect; it is straightforward to count the numbers of links, hierarchies, cross-links
and examples. Scoring time is also reduced: it is possible to score students' concept maps
at a rate of one map per minute. Teachers' test preparation time is reduced as well when
concept mapping is used as an alternative assessment. Teachers only need to provide a list
of concepts, and students are free to add any concept not included in the list.

For students, IRT scoring could make adaptive testing possible. In principle, the
difficulty of a concept mapping test is appropriate for a student at any ability level; in
this sense, a concept mapping test adapts to students' ability levels. Students also feel
less intimidated by a concept mapping test, since it gives them more freedom to construct
and express their conceptions.
A criticism of conventional concept map scoring is that students' scores are
concept-dependent. IRT concept map scoring provides concept-free ability estimates. For
example, one group of students' concept mapping scores on Acid and Base can hardly be
compared with another group's scores on Forces when Novak's scoring scheme is used,
because their concept maps are based on different concepts. If IRT is used for scoring,
the two groups can be compared by estimating both groups' abilities concurrently. In this
situation there will be eight "test items", four for each group. The categorical responses
to the four "test items" not answered by a group are coded as "not reached". The final
ability estimates for both groups will then be on the same scale and can be compared
directly.
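The data layout for such a concurrent calibration can be sketched as follows. Each student gets eight columns, and the four columns belonging to the other group's concept mapping task are filled with a not-reached code. The code value `NOT_REACHED = -9`, the item names, and the response values are hypothetical placeholders. The actual not-reached convention depends on the estimation program's input format and should be taken from its manual.

```python
NOT_REACHED = -9  # hypothetical missing-data code; check the estimation program's documentation
ITEMS = ["links", "hierarchies", "cross_links", "examples"]


def combined_matrix(group_a: list[list[int]], group_b: list[list[int]]) -> list[list[int]]:
    """Stack two groups' four-item responses into one eight-item matrix.

    Columns 0-3 hold group A's task (e.g. Acid and Base), columns 4-7 group B's task (e.g. Forces).
    """
    filler = [NOT_REACHED] * len(ITEMS)
    rows = [list(r) + filler for r in group_a]   # group A did not attempt group B's items
    rows += [filler + list(r) for r in group_b]  # group B did not attempt group A's items
    return rows


acid_base = [[5, 4, 1, 2], [7, 6, 2, 0]]  # hypothetical recoded responses
forces = [[4, 3, 0, 1]]
for row in combined_matrix(acid_base, forces):
    print(row)
```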

The practical difficulty of using IRT for scoring is that an IRT parameter
estimation program and a computer are necessary. IRT parameter estimation programs,
such as MULTILOG, cost a few hundred dollars. The computer skills needed to prepare
and run an IRT parameter estimation program are minimal; a few step-by-step
instructions will do the job. IRT parameter estimation usually requires at least a 386
computer, although a slower computer such as a 286 may also work if results are not
needed in a hurry, as is the case for most classroom assessments. Most schools are
equipped with at least one IBM-compatible computer.

The meaning of IRT ability estimates is not immediately obvious. For example,
how should one interpret an ability estimate of -.005? We have used percentage scores
for decades and interpret them as the percentage of items answered correctly when
multiple-choice items are used. IRT ability estimates are not on a ratio scale; at best
they are on an interval scale, which is also true of percentage scores. IRT ability
estimates may be better interpreted in terms of proficiency levels, as suggested by
Hambleton (1991). It is also possible to establish a transformation between IRT ability
estimates and conventional scores such as percentage scores; examples are given in
Woodcock (1978) and Wright (1977).
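Because IRT ability estimates are at best on an interval scale, a positive linear transformation preserves both their ordering and their relative differences. As one simple illustration, the sketch below rescales an estimate to a reporting scale with a hypothetical mean of 50 and standard deviation of 10. This is not the method of Woodcock (1978) or Wright (1977). The sketch uses the mean (-0.256) and standard deviation (1.653) of the ability estimates reported in this study.

```python
THETA_MEAN = -0.256  # mean IRT ability estimate (Table 1)
THETA_SD = 1.653     # standard deviation of IRT ability estimates (Reliability section)


def to_reporting_scale(theta: float, target_mean: float = 50.0, target_sd: float = 10.0) -> float:
    """Linearly map an ability estimate onto a reporting scale (hypothetical mean and SD)."""
    z = (theta - THETA_MEAN) / THETA_SD  # standardize relative to this sample
    return target_mean + target_sd * z


print(round(to_reporting_scale(-0.005), 1))
```

On this illustrative scale, the estimate of -.005 mentioned above becomes about 51.5, that is, slightly above the mean of this sample.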




Reliability 



The last column in Table 1 lists the Standard Error of Estimation for each ability
estimate. The average standard error of estimation for the test is calculated as

$$\bar{\sigma}_e = \frac{1}{N}\sum_{j=1}^{N} \sigma_j \qquad (4)$$

where $\sigma_j$ is the standard error of estimation for student j's IRT ability estimate and N
is the number of students.

By substituting the values in Table 1 into formula 4, the average standard error of
estimation for the test was calculated as .635. By substituting this average (.635) and the
standard deviation of IRT ability estimates (1.653) into formula 2, the reliability of the
test is calculated as .85.
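The arithmetic in this paragraph can be reproduced with a short Python sketch of formula 2 and formula 4. The averaging step assumes formula 4 is a simple arithmetic mean of the per-student standard errors. The list of standard errors in the example is hypothetical, because the individual Table 1 values are not listed in this text. The last line uses the two summary values reported above.

```python
def mean_see(standard_errors: list[float]) -> float:
    """Average standard error of estimation, taken here as the arithmetic mean (assumption)."""
    return sum(standard_errors) / len(standard_errors)


def irt_reliability(avg_see: float, sd_theta: float) -> float:
    """Formula 2: rho_xx = 1 - sigma_e^2 / sigma_x^2."""
    return 1.0 - (avg_see ** 2) / (sd_theta ** 2)


# Hypothetical per-student standard errors, only to exercise mean_see().
example_ses = [0.60, 0.62, 0.65, 0.67]
print(round(mean_see(example_ses), 3))

# Summary values reported in the text: average SEE = .635, SD of ability estimates = 1.653.
print(round(irt_reliability(0.635, 1.653), 2))
```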


Discussion

The results presented above show that IRT scoring of concept maps is generally
valid and reliable. The correlation between IRT ability estimates and total concept
mapping scores based on Novak's scoring scheme is significant. The significant
correlation between IRT ability estimates and students' conventional test scores is also
consistent with the results obtained with Novak's scoring scheme. This demonstrates that
scoring students' concept maps on the basis of their structural characteristics, as
defined by the numbers of links, hierarchies, cross-links, and examples, is a valid
approach. The advantage of IRT scoring is its reliability: as reported above, the
reliability is as high as .85, which should be sufficient in most classroom testing
situations.

The number of examples does not contribute much to the validity of the scoring
scheme. It does not have high discriminating power, as indicated in Table 3, and it is not
significantly correlated with the numbers of links, hierarchies and cross-links. This
indicates that including the number of examples in the scoring scheme does not contribute
significantly to internal validity. The number of examples is not significantly correlated
with students' IRT ability estimates, nor with their conventional test scores. Even under
Novak's scoring scheme, the number of valid examples is not significantly correlated with
the total concept mapping scores. This indicates that the number of examples does not
contribute significantly to external validity either. The numbers of examples and valid
examples in Table 1 show high variation in the number of examples in students' concept
maps: most students have no examples in their concept maps at all, while a few have more
than 10. This seems to indicate that the use of examples in concept mapping may not be a
stable characteristic of students' conceptual understanding. It is also possible that the
use of examples is closely related to the topic of the concept map, i.e., some topics may
call for more examples and others for fewer. If this is the case, the universality of
examples as a characteristic of students' conceptual frameworks is questionable.

Although there is a significant correlation between IRT ability estimates and
students' conventional test scores, this correlation is not significant within specific
student groups such as the enriched, adjusted and regular classes. This suggests that
concept mapping scores may not predict students' conventional test scores when the
student group is sufficiently homogeneous. In Table 1, students 1 to 59 are enriched class
students, students 60 to 66 are adjusted class students, and students 67 to 92 are regular
class students. From the distributions of students' conventional test scores and IRT
ability estimates, the variation in conventional test scores within each group appears to
be smaller than the variation in IRT ability estimates within each group. It seems that
the conventional test is not as discriminating as concept mapping. These results may
explain the non-significant correlations between concept mapping scores and conventional
test scores reported previously (e.g., Novak, Gowin and Johansen, 1983; Trigwell and Sleet,
1990). If concept mapping is more discriminating, a sensible hypothesis is that concept
mapping as an alternative science assessment may be more appropriate in large-scale
assessment, where students are more heterogeneous, if predictive validity is of interest.
Of course, the non-significant correlations may also arise because concept mapping and
conventional tests assess different aspects of knowledge, as suggested by Trigwell and
Sleet (1990).
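The homogeneity explanation corresponds to the general statistical effect of range restriction: selecting a narrow band of students on one variable lowers the correlation observed within that band. The Python sketch below illustrates this effect with simulated data only. The population correlation of 0.5, the sample size, the seed and the "top 30 percent" cut are arbitrary assumptions, not estimates from this study.

```python
import math
import random


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1993)
RHO = 0.5  # assumed population correlation between ability and test score
pairs = []
for _ in range(20000):
    ability = rng.gauss(0.0, 1.0)
    test_score = RHO * ability + math.sqrt(1 - RHO ** 2) * rng.gauss(0.0, 1.0)
    pairs.append((ability, test_score))

r_full = pearson([a for a, _ in pairs], [t for _, t in pairs])

# Keep only the top 30 percent on ability, mimicking a homogeneous (e.g. enriched) group.
pairs.sort(key=lambda p: p[0])
top = pairs[int(0.7 * len(pairs)):]
r_top = pearson([a for a, _ in top], [t for _, t in top])

print(f"whole sample r = {r_full:.2f}, top 30% r = {r_top:.2f}")
```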

A complete computer package including concept mapping facilities and IRT
scoring may be more convenient for classroom use. Currently, a few concept mapping
packages are available, such as SemNet (Fisher, 1990) and Inspiration by Inspiration
Software Inc. for the Macintosh. IRT parameter estimation programs, such as MULTILOG
and ManyFacet (developed at the University of Chicago), have been used on IBM
compatibles for years. An integrated system of concept mapping and IRT scoring on
popular IBM compatibles would be much more convenient. Once such a system is available,
testing will be much more flexible than it is now. For example, a student could take a
concept mapping test whenever he or she likes, in school or at home, during the day or in
the evening, because concerns about the confidentiality of concept mapping tests are much
smaller than in traditional testing situations.



*The author sincerely thanks the participating teachers, Mr. Mike Hinchey and Ms. Leona
Williams, for their help with the data collection. This study was made possible by a
research grant (#UCR192) from St. Francis Xavier University.
Table 1

Structural characteristics of students' concept maps: number of links/valid links
(c1/c5), number of hierarchies/valid hierarchies (c2/c6), number of cross-links/valid
cross-links (c3/c7), number of examples/valid examples (c4/c8), IRT ability estimates (c9),
conventional test scores (c10), and standard errors of estimation (c11)
Table 2

Correlation between the number of links (c1), number of hierarchies (c2), number
of cross-links (c3), number of examples (c4), IRT ability estimates (c5), and conventional
test scores (c6) (n=92)

        c1       c2       c3       c4       c5
c5    0.839*   0.733*   0.356*   0.097
c6    0.130    0.077    0.315*   0.086    0.276*

*p < .05 (two tails)
Table 3

Item characteristics

         Links    Hierarchies   Cross-links   Examples
a        1.88 ... 0.44
b(1)     -5.86    -6.18         -14.03        -12.91
b(2)     -1.89    -2.40           0.37         -2.68
b(3)     -1.37    -1.45           1.02          0.03
b(4)     -0.73    -0.89           2.50          2.58
b(5)     -0.34    -0.24          18.07          4.85
b(6)      0.57     0.27           1.10          5.83
b(7)      1.17     0.95          -9.98          7.35
b(8)      1.58     1.44                        13.14
b(9)      2.39     1.84                       -10.72



Table 4

Correlation between students' ability estimates (c1), number of valid links (c2),
number of valid hierarchies (c3), number of valid cross-links (c4), number of valid
examples (c5), and total concept mapping scores (c6) (n=92)

c2    .800*
      .147
c5    -.036

       c1      c2      c3      c4      c5
c6    .764*   .800*   .753*   .696*   .131

*p < .05 (two tails)
Table 5

Correlation between students' conventional test scores (c1), number of valid links
(c2), number of valid hierarchies (c3), number of valid cross-links (c4), number of valid
examples (c5), and total concept mapping scores (c6) (n=92)

*p < .05 (two tails)




Table 6

Correlation between IRT ability estimates (c5), conventional test scores (c6),
number of links (c1), number of hierarchies (c2), number of cross-links (c3), and number
of examples (c4) for the enriched classes (n=59)

.555*
.433*   .220
.872*   .825*   .449*   -.001

       c1      c2      c3      c4      c5
c6    .114   -.039    .187    .266*   .084

*p < .05 (two tails)



Table 7

Correlation between IRT ability estimates (c5), conventional test scores (c6),
number of links (c1), number of hierarchies (c2), number of cross-links (c3), and number
of examples (c4) for the regular class (n=26)

c2
.791*   .732*
-.368
.095    -.344

*p < .05 (two tails)




Table 8

Correlation matrix between IRT ability estimates (c5), conventional test scores
(c6), number of links (c1), number of hierarchies (c2), number of cross-links (c3), and
number of examples (c4) for the adjusted class (n=7)

.0    -.088

*p < .05 (two tails)